ABSTRACT



This thesis addresses the problem of small user groups
being forced to use input data collected and processed by
sources outside their span of control. Specifically, the
use of an active data dictionary to locally validate such
input data is examined. The thesis proceeds from a general
review of data validation techniques and criteria, through
an examination of data dictionaries, to an illustration of
how an active data dictionary can be configured to act as a
"data filter" for input data.



Key initial planning and design steps are set forth,
including requirements analysis, data definition, and
initial logical design. A checklist of questions to answer
during each of these activities is included.



The concepts discussed in the paper are then applied to
a specific case (DCSPLANS Branch, U.S. Army Military
Personnel Center, Alexandria, VA) resulting in a "data
filter" structure diagram that is tailored to the DCSPLANS
environment and its unique validation needs.
I. INTRODUCTION



A. CONTROL OF DATA 

One problem plaguing today's information manager is the
serious lack of control over data which has developed as
computers and their applications have spread throughout
organizations. Recently, there has been a considerable
increase in the attention being paid to this problem.
However, most organizations whose information systems were
developed in the 60's and early to middle 70's still suffer
the ill effects of improperly controlled data. In these
environments, redundant, incomplete, and inaccurate data are
still prevalent. Under such circumstances, the probability
that faulty data will directly contribute to poor
organizational planning and ineffective decision-making is
significantly increased.

While some organizations have undertaken action to
correct their data control problems, many others are
overwhelmed by the enormity, complexity, and cost of the
task. In very large organizations, the cost and complexity
take on proportions that appear extremely prohibitive.
Unfortunately, it is these large organizations which have
the greatest need for carefully controlled data. Large
organizations are also more likely to experience adverse
effects which extend beyond those found in smaller
enterprises.

One of these effects is manifest in the helpless
position in which some organizational user groups find
themselves. As one cog in a large wheel, these groups often
are forced to use data collected and processed by other
organizational elements over whom they exercise no control.
A serious danger in this circumstance is the receipt and
subsequent use of inaccurate data.

Information systems need valid data to be effective! A
rash assumption by a data processing element that inaccurate
data are correct can have devastating effects on a parent
organization, especially if information based on the data is
used for strategic planning/decision-making.

When input data of unknown validity is being transferred
among data processing elements within an organization, the
problem is almost always a systemic one with deep and
widespread roots. Corrective action on an organization-wide
basis often is neglected because of excessive costs. Users
who find themselves in these situations are frequently left
to their own devices, and they must develop their own means
for validating inputs. An illustration of a user group
experiencing such a situation is the Office of the Deputy
Chief of Staff, Plans (DCSPLANS), U.S. Army Military
Personnel Center (MILPERCEN), in Alexandria, Virginia.



B. DCSPLANS, MILPERCEN

U.S. Army MILPERCEN is responsible for the worldwide
distribution and professional development of Army officer
and enlisted personnel. Within MILPERCEN, DCSPLANS has the
mission of planning, programming, and executing current and
future force alignment, i.e., matching personnel inventory
to force authorization levels.

DCSPLANS is composed of five branches, each of which
monitors a specified portion of the force alignment mission.
Each branch uses a series of computerized models to perform
a variety of forecasting functions. See Figure 1.1 for an
example of the models and input files used by DCSPLANS'
branches. Many of these models are quite complex and draw
input data from both MILPERCEN and non-MILPERCEN sources.
Some input files are extremely large, feed a number of
models, and, historically, have been prone to error. None of
the input files are under DCSPLANS control.

Figure 1.1 Force Plans Branch

The output of DCSPLANS' models is used for crucial top
level decision making which will determine the structure and
content of Army forces well into the future. As such, the
DCSPLANS output must exhibit a very high degree of validity.
Currently, however, DCSPLANS is unable to verify the
accuracy of much of the input data being used by its models.
Thus, despite the correctness of the models themselves, the
reliability of the DCSPLANS product must be considered
doubtful.



DCSPLANS officials are quite concerned about their
present inability to insure that the data used in their
models are accurate. They realize the problem will not be
solved for them soon by the organization (MILPERCEN), and
that they must devise their own local solution. A variety
of options are available to them. Some are quite poor
(e.g., maintain the status quo and rely on the input data
sources to insure validity); others are more feasible, but
still contain serious shortcomings (e.g., update/convert
every DCSPLANS model to include its own validation process).
A much more effective and efficient alternative is described
in this paper, i.e., the use of an active data dictionary as
a "filter" to validate input before the data is processed by
the various models.



C. THESIS METHODOLOGY 

This thesis will explore the concept of using an active
data dictionary as a local validation tool. It will proceed
from a general review of data validation, through an
examination of data dictionaries and their design, to an
illustration of how an active data dictionary can be
beneficially applied to DCSPLANS operations.

Chapter Two of the thesis cites the essential role of
data validation as an integral part of a data processing
system. Validation criteria and techniques used in the
"data filter" are reviewed, and the general nature of edit
and validation rules is introduced.

Chapter Three explores the data dictionary. It includes
some basic definitions and concepts, and specifically
addresses how an active data dictionary is used to validate
data.



Chapter Four outlines an approach to "local" initial
design of a data dictionary "filter" system. This chapter
also includes a recommended "checklist" of questions a user
group can ask to define its own data dictionary/validation
requirements and system structure.

Chapter Five specifically addresses the DCSPLANS
situation. It cites a proposed goal and some key objectives
of a DCSPLANS validation system, and uses a modified
structure diagram of a "data filter" to illustrate the
recommended approach to DCSPLANS' data validation dilemma.

Chapter Six summarizes the results of this thesis. 






II. INPUT VALIDATION 



A. GENERAL DESCRIPTION 

Inaccurate data items can easily find their way into
master files and databases, either through direct input by
users or through improper processing actions by application
programs. Regardless of origin, inaccurate data are poison
in any ADP system. Information created from inaccurate data
also tends to be inaccurate, and decisions based upon such
information are counterproductive to organizational goals in
almost every instance. Data is a valuable resource, and its
accuracy is crucial to organizational success.

Validation is that set of actions which attempts to
preclude the existence of inaccurate data within an
information system. Validation tests can be implemented at
any number of stages within the data processing cycle:
prior to input, upon input, during processing, and after
processing (output checks). "Input validation", as
implemented by an active data dictionary system, occurs at
the second stage.

Input validation focuses specifically on data being
entered into a system. Its aim is to detect errors and
thereby insure the initial accuracy of the master file or
database being constructed/updated. [Ref. 1:p. 326] During
input validation, checks are conducted to insure that the
input/update operation itself is legal, and that input data
does not violate prescribed accuracy constraints. Creation
of a new file or the update of an existing one is a
processing stage that demands extremely careful data
validation, especially in those cases where the input data
is received from sources outside the control of the
processing element. Fortunately, it is at this stage that
the accuracy of data can be checked most effectively
[Ref. 2:p. 239]. One additional caution which must be
mentioned at this point is that data does not become
inaccurate from entry errors alone. Data may be inaccurate
simply because it is old! Previously accurate values may no
longer be correct because available new values have not
superseded older values due to neglected updates.
Validation processes also must check for these types of
inaccuracies.



B. VALIDATION TECHNIQUES

1. Category

The general category of input validation techniques
used by the "data filter" being proposed examines input data
in the exact form in which it arrives for processing. The
techniques involved detect errors by checking the
"acceptability" of both the data transactions and the data
itself. This checking is accomplished through a series of
programmed instructions/rules, and is implemented very
effectively by an active data dictionary system. Three
basic techniques are included in the category: transaction
validation, format checks, and reasonableness checks. A
well designed validation program includes a combination of
all three. [Ref. 3:p. 248]

The transaction validation technique is used to
verify the legitimacy of transactions which input data. The
format checks and reasonableness checks, on the other hand,
are used to examine the correctness of data items
themselves. In order to facilitate a clearer picture of the
"data filter" design which will be presented in the next two
chapters, a brief description of the three validation
methods is provided below.

2. Transaction Validation

Transaction validation should be the first technique
to be applied. It certifies that "a specific transaction
is one that can be processed by the system and is being
submitted properly." [Ref. 4:p. 218] Its focus is the
verification that the type and purpose of the transaction
are legitimate processing actions, and that the originator
of the transaction has the authority to initiate it.
Transactions determined to be inaccurate are rejected.

Related validation checks which also must be
conducted during this juncture of the processing cycle are
checks for sequential dependencies and/or proper timing.
For example, a Monthly_Report transaction may not be able to
take place until Monthly_Update transactions are
successfully executed.

The role of transaction validation as a "first step"
stems from the potential damage which could be inflicted
upon a system by the processing of an invalid transaction.
Even if the invalid transaction is subsequently discovered,
recovery may prove extremely difficult. An ounce of
prevention, in this case, is certainly worth a pound of
cure!
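
The checks described in this subsection (legitimate
transaction type, originator authority, and sequential
dependency) can be sketched as follows. This is only an
illustrative sketch: the lookup tables, the originator names,
and the function validate_transaction are assumptions made
for the example and do not come from any DCSPLANS system.

```python
from dataclasses import dataclass

# Hypothetical tables; in practice these would come from the data dictionary.
ALLOWED_TRANSACTIONS = {"Monthly_Update", "Monthly_Report"}
AUTHORIZED_USERS = {
    "Monthly_Update": {"clerk", "analyst"},
    "Monthly_Report": {"analyst"},
}
# A transaction may run only after its prerequisites have completed.
PREREQUISITES = {"Monthly_Report": {"Monthly_Update"}}


@dataclass
class Transaction:
    kind: str        # type of transaction requested
    originator: str  # who submitted it


def validate_transaction(txn, completed):
    """Return a list of reasons the transaction is invalid (empty if valid)."""
    errors = []
    if txn.kind not in ALLOWED_TRANSACTIONS:
        # Unknown types are rejected at once; the other checks do not apply.
        errors.append(f"unknown transaction type: {txn.kind}")
        return errors
    if txn.originator not in AUTHORIZED_USERS[txn.kind]:
        errors.append(f"{txn.originator} may not submit {txn.kind}")
    missing = PREREQUISITES.get(txn.kind, set()) - completed
    if missing:
        errors.append(f"prerequisites not yet run: {sorted(missing)}")
    return errors
```

A caller would reject any transaction for which the returned
list is non-empty, before any of its data is examined.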

Once transaction validity is established, the input
data itself is examined through a series of format checks
and reasonableness checks.



3. Format Checks

Format checks compare the actual contents of a field
to a pre-set series of user-defined rules. A record whose
contents fail to conform to the prescribed format either is
rejected outright or transferred to an appropriate error
handling routine. Some of the more common format checks
are:

a) Length Checks: used to verify that a field contains a
prescribed minimum, maximum, or fixed number of
characters.

b) Character Type Checks: used to verify that a field
contains only specifically authorized value types,
i.e., numerics only, alphabetics only, blanks, or
special characters.

c) Character Pattern Checks: used to verify that the
contents of a field match a prescribed pattern of
alphabetics, numerics, dashes, etc.

d) Date Checks: used to insure that the contents of a
date field are entered in the required, standard
format, i.e., YYMMDD or YYDDD.
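
The four format checks listed above can each be expressed as
a small predicate. The sketch below is illustrative only:
the function names are assumptions, the pattern check is
shown with a regular expression (one possible way to state a
pattern), and the date check covers only the YYMMDD form.

```python
import re
from datetime import datetime


def length_ok(value, min_len=0, max_len=None):
    """Length check: value has between min_len and max_len characters."""
    if len(value) < min_len:
        return False
    return max_len is None or len(value) <= max_len


def type_ok(value, kind):
    """Character type check for 'numeric' or 'alphabetic' fields."""
    if kind == "numeric":
        return value.isascii() and value.isdigit()
    if kind == "alphabetic":
        return value.isascii() and value.isalpha()
    raise ValueError(f"unknown character type: {kind}")


def pattern_ok(value, pattern):
    """Character pattern check: the whole value must match the pattern."""
    return re.fullmatch(pattern, value) is not None


def date_ok(value):
    """Date check: value is a real calendar date in YYMMDD form."""
    if not (len(value) == 6 and value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%y%m%d")
    except ValueError:
        # Six digits, but not a possible date (e.g. February 31).
        return False
    return True
```

For example, date_ok("850231") is False because February has
no 31st day, even though the field has the right length and
character type.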

4. Reasonableness Checks

Reasonableness checks test data items to insure that
data values fall within the limits of established
constraints. These constraints are separated into three
basic types. Field constraints limit the value of a given
data item. Intrarecord constraints limit values between
fields in the same record. Interrecord constraints limit
values between fields in different records. [Ref. 5:p. 179]
Reasonableness checks based upon field constraints are
fairly straightforward in design and application.
Intrarecord and interrecord constraint checks, however, deal
with logical accuracy and the interrelationships among data
items. As such, they are much more difficult to develop and
manage. Common reasonableness checks are:

a) Field Constraints:

- Range Checks - used to verify that the field value
falls within a specified range, i.e., the value does
not violate an upper or lower limit.

- Sequence Checks - used to test a specially created
field to insure records are processed in the proper
order. These checks are also used to verify the
presence of all required records.

- Completeness Checks - used to confirm that each
mandatory field in a record is filled with a data
item of some prescribed size.

- Date Checks - used to verify that the contents of a
date field do not violate earliest or latest
acceptable date restrictions.

- Code Checks - used to verify that the contents of a
code field are contained within a listing of valid
and current codes.

b) Intrarecord and Interrecord Constraints:

- Completeness Checks - used to identify those fields
in a record which must be filled based upon the
contents of other fields in that record
(intrarecord) or other records (interrecord).

- Consistency Checks - used to verify that the values
in certain fields are valid in relation to the data
values of other fields (either in the same record or
other records).

An example of an intrarecord completeness check is,
"if the Conversion Indicator field in a record is filled,
then the Conversion Code field in that record must also be
filled." An interrecord version of a completeness check is
as follows: "if the VRS Multiplier field is filled for any
record in this run, then all VRS Multiplier fields must be
filled."

An example of an intrarecord consistency check is,
"If the MOS Code in a record is 63H, then the grade value in
the record must be either E4 or E5." An interrecord
consistency check is "no SSN field value may be the same as
the SSN field value of another record."

It is also possible to have "interfile"
dependencies, e.g., a record with an SSN field value of
"9999999" in file "A" must have the same MOS field value as
a record in file "B" which has an identical SSN field value
of "9999999."
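
A few of the reasonableness checks above can be sketched as
simple functions over records held as dictionaries. The
sketch is illustrative only: the key names (SPECIALTY, GRADE,
CONV_IND, CONV_CODE, SSN) and the function names are
assumptions chosen for the example, not field names taken
from any actual input file.

```python
def range_ok(value, low, high):
    """Field constraint: value lies within the inclusive range [low, high]."""
    return low <= value <= high


def code_ok(value, valid_codes):
    """Field constraint: value appears in the list of valid, current codes."""
    return value in valid_codes


def grade_consistent(record):
    """Intrarecord consistency: specialty 63H requires grade E4 or E5."""
    if record["SPECIALTY"] == "63H":
        return record["GRADE"] in {"E4", "E5"}
    return True


def conversion_complete(record):
    """Intrarecord completeness: a filled indicator requires a filled code."""
    if record.get("CONV_IND"):
        return bool(record.get("CONV_CODE"))
    return True


def duplicate_ssns(records):
    """Interrecord consistency: return SSN values that occur more than once."""
    seen, dupes = set(), set()
    for rec in records:
        ssn = rec["SSN"]
        if ssn in seen:
            dupes.add(ssn)
        seen.add(ssn)
    return dupes
```

Note that the field and intrarecord checks need only one
record at a time, whereas the interrecord check must see the
whole run, which is one reason such checks are harder to
develop and manage.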

C. EDIT AND VALIDATION RULES

There must be an organized and consistent method for
applying the validation checks cited above to data being
input into an information system. The vehicle for this
application is the edit and validation rule (EVR). EVRs are
explicit statements of constraints about the data in a
system. These rules monitor the basic structure and
relationships of data items, and enforce processing
restrictions established by the information manager.
[Ref. 6:p. 146]

Two key issues concerning EVRs must be addressed when
building a data validation system. The first is how to
properly develop consistent rules. Consistent rules promote
accurate data, whereas contradictory rules produce an
unreliable data system that eventually will crash.
(Definition and development of EVRs will be covered in
Chapter Four as an integral part of the overall "data
filter" design process.)

The second key issue is where to place an EVR module
(i.e., is it better to embed it as part of an application
program, or is it better to make it a separate validation
program?). The use of an active data dictionary as a "data
filter" argues for the latter approach. The rationale for
such a placement is set forth in the next chapter.
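
One practical consequence of stating EVRs explicitly, rather
than burying them in application code, is that the rule set
itself can be examined for contradictions. The sketch below
illustrates the idea for range rules only. The RangeRule
class, the element names, and the function
find_contradictions are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeRule:
    """A hypothetical EVR: `element` must lie within [low, high]."""
    element: str
    low: float
    high: float


def find_contradictions(rules):
    """Return pairs of rules on the same element that no value can satisfy."""
    conflicts = []
    for i, first in enumerate(rules):
        for second in rules[i + 1:]:
            if first.element != second.element:
                continue
            # Two ranges are jointly satisfiable only if they overlap.
            if max(first.low, second.low) > min(first.high, second.high):
                conflicts.append((first, second))
    return conflicts
```

Here a rule requiring a value between 17 and 60 and another
requiring the same element to lie between 65 and 99 would be
reported as a conflicting pair, since every input would fail
at least one of them.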



III. DATA DICTIONARY AS "DATA FILTER" 



A. BASIC CONCEPTS 

Four basic concepts are central to a clear understanding
of how a data dictionary can be used locally to validate
data maintained and provided by other sources. These are:
Data Dictionary, Metadata, "Active" Data Dictionary, and
Data Extraction.



1. Data Dictionary

A data dictionary is a centralized repository of all
definitive information about the relevant data in an
enterprise. The data dictionary provides the user a
description of what data exists, what it looks like, and
what it means. [Ref. 7:p. 1] A data dictionary can be as
simple as a manual catalog system or as complex as an
automated set of programs which controls a wide range of the
enterprise's data processing operations.



2. Metadata 

The real world of an enterprise contains a number of
data objects (entities) which are represented in the
enterprise's information system as data elements, records
and files. For example, customers (entity) are represented
by a set of data elements/fields (CUST_ID, CUST_NAME, etc.)
which comprise records (CUST_REC), which, in turn, are
grouped into files (CUST_FILE). The data used to define and
describe these entities are called metadata, i.e., data
about the data. Metadata are stored in the data dictionary,
forming a metadata database or metadatabase. [Ref. 8:p. 9]
Dictionary metadata contain the characteristics of each data
object. The metadata answer the following questions:

a) What data is available in the enterprise?

b) What does the data mean?

c) How is the data structured?

d) What constraints and relationships exist?

Typically, dictionary metadata include: object name,
short name, synonyms or aliases, source, narrative
description, records/files that use or contain the data
object, data structure/format, integrity constraints (e.g.,
value range), and relationships/dependencies.
[Ref. 9:p. 18] Metadata are essential ingredients in the
validation of data by a data dictionary system.
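
The kinds of metadata just listed can be pictured as one
entry per data element in a small metadatabase. The sketch
below is only an illustration: the class names, attribute
names, and sample values are assumptions, and an actual data
dictionary product would define its own entry layout.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMetadata:
    """One data dictionary entry describing a data element."""
    name: str
    short_name: str
    description: str
    source: str
    data_type: str                                   # e.g. "numeric"
    length: int
    aliases: list = field(default_factory=list)
    used_in: list = field(default_factory=list)      # records/files containing it
    value_range: tuple = None                        # (low, high) constraint
    depends_on: list = field(default_factory=list)   # related elements


class DataDictionary:
    """A minimal in-memory metadatabase keyed by name, short name, or alias."""

    def __init__(self):
        self._entries = {}

    def register(self, meta):
        # Every name the element is known by points at the same entry.
        for key in [meta.name, meta.short_name, *meta.aliases]:
            self._entries[key] = meta

    def lookup(self, name):
        return self._entries[name]
```

Because every synonym resolves to a single entry, there is
only one place where the definition of an element can be
changed.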



3. "Active" Data Dictionary

There are two basic modes in which a data dictionary
can function: passive or active. A passive data dictionary
merely registers the metadata and provides the user a
facility for interactive query and/or report generation. It
does not require that data processing operations depend upon
it for metadata, and no direct link is maintained between
the passive data dictionary and other system components.
(See Figure 3.1) In fact, application programs and
processes may obtain their metadata entirely from other
sources.

An active data dictionary, on the other hand,
exercises a great deal of control over processing and
metadata usage within an information system. A data
dictionary is said to be active with respect to an
information system if, and only if, that system is
dependent upon the data dictionary for its metadata. (See
Figure 3.2)
Figure 3.1 Passive Dictionary

A dictionary is active to a lesser degree when only
some of the system's programs and processes are dependent
upon it for metadata. The more programs or processes that
rely on the dictionary, the more active it is said to be.
[Ref. 10:p. 22] The value of an active data dictionary stems
from the establishment of mandatory interfaces between it
and various system processes. When the data dictionary is
used as a "data filter", these mandatory interfaces will
insure that input data conform to pre-defined rules and
standards.
Figure 3.2 Active Dictionary

4. Data Extraction

Data extraction is a technique whereby a subset of
data from a very large file system or database is transferred
to a much smaller "extracted" file or database. The data
extraction process can be either quite simple or very
complex. A complex data extraction process is designed to
collect, format, and integrate data from a number of source
files/databases into a single data source whose contents are
specifically tailored to the needs of a single user or group
of users. Such a system involves extensive data
description, subsetting, aggregation, and presentation
operations. [Ref. 11:p. 245] This thesis addresses data
extraction from a much simpler perspective, i.e., as a means
to limit the size of the data to be validated by the data
dictionary. In most cases, user applications do not need
all data contained in a large data source. Thus, the
extraction of only pertinent data (a much smaller subset)
usually serves to increase the speed of application programs
acting upon the data. Such data extraction operations can
be used to greatly enhance the efficiency of the proposed
"data filter" when large source files are involved. A
diagram of a simple data extraction design which can be used
in conjunction with a data dictionary "filter" is shown in
Figure 3.3.

Throughout the remainder of this thesis, the term
"data filter" will refer to the active data dictionary
validation system being proposed.
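
The simple form of data extraction described above (carry
forward only the pertinent records and fields) can be
sketched in a few lines. The function name extract, its
parameters, and the sample field names used with it are
assumptions made for illustration.

```python
def extract(records, wanted_fields, keep=lambda rec: True):
    """Yield a reduced copy of each record that passes the `keep` test.

    Only the fields named in `wanted_fields` are carried forward, so the
    validation step sees a much smaller data set than the source file.
    """
    for rec in records:
        if keep(rec):
            yield {name: rec[name] for name in wanted_fields}
```

Because the function is a generator, records are produced
one at a time, so a very large source file need not be held
in memory while the extracted subset is being written.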



B. CONFIGURATION 

1. Metadata Generation

Figure 3.3 Data Extraction Design

The key to constructing the data filter is
incorporating into a data dictionary the capability to
generate the metadata needed by a system's edit and
validation software. The metadata generation is triggered
by the edit and validation software through the issuance of
commands and applicable parameters. The data filter must be
designed so that the edit and validation, with its mandatory
call for metadata generation, is automatically activated
during all data input operations. The resulting metadata
generation produces data descriptions based upon the
characteristics stored in the data dictionary metadatabase.
These data descriptions are transformed into specific edit
and validation rules (EVR) for use by the edit and
validation programs. [Ref. 12:p. 116]
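
The transformation of stored characteristics into EVR can be
sketched as a function that reads one element's metadata and
returns a list of executable rules. This is an illustrative
sketch under stated assumptions: the metadata keys ("length",
"type", "range") and the function names are invented for the
example, and each EVR is represented simply as a
(description, test) pair.

```python
def generate_evrs(metadata):
    """Turn one element's metadata (a dict) into a list of EVR predicates.

    Each EVR is a (description, test) pair, where `test` takes the raw
    field value as a string and returns True when the value is acceptable.
    """
    evrs = []
    length = metadata.get("length")
    if length is not None:
        evrs.append((f"length == {length}", lambda v: len(v) == length))
    if metadata.get("type") == "numeric":
        evrs.append(("numeric only", lambda v: v.isascii() and v.isdigit()))
    if "range" in metadata:
        low, high = metadata["range"]
        evrs.append((
            f"{low} <= value <= {high}",
            # Non-numeric input fails the range rule rather than raising.
            lambda v: v.isascii() and v.isdigit() and low <= int(v) <= high,
        ))
    return evrs


def violations(value, evrs):
    """Return the descriptions of every EVR that `value` fails."""
    return [desc for desc, test in evrs if not test(value)]
```

With metadata of {"length": 2, "type": "numeric",
"range": (1, 40)}, the value "07" satisfies every generated
rule, while "99" fails only the range rule.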



2. Edit and Validation Programs

Edit and validation programs are separate from the
application programs which enter the data into the system.
They cannot be executed without data dictionary metadata (in
the form of EVR) through which they will filter all incoming
data. These programs are usually general purpose in nature.
The tailoring of the programs to specific types of data is
accomplished through the EVR provided by the active data
dictionary. For example, an EMF data entry operation will
result in different EVR being passed to an edit and
validation program than will a PMAD data entry (PMAD data
may be composed of totally dissimilar data objects than EMF
data, and may also involve very different validation
criteria). Various edit and validation programs can be
incorporated into the data filter to accommodate distinct
categories of data entry operations, e.g., updates,
deletions, creation of new files, etc.

3. General Design

Figure 3.4 General Data Filter Design

Figure 3.4 depicts a generalized data filter design.
The data dictionary generates metadata based upon commands
from the edit and validation program. Then, the metadata is
transformed into EVR which are fed back into the edit and
validation program. The edit and validation program
"filters" incoming data through the EVR during the edit and
validation process. "Correct" data is moved to the
appropriate storage area, and erroneous data is either
rejected outright or sent to an error file for future
editing and resubmission.
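
The routing step of this design, in which correct data moves
on and erroneous data is set aside with the reasons for its
rejection, can be sketched as follows. The function
run_filter and the shape of its rule table are assumptions
made for illustration; here each EVR is taken to be a
(description, test) pair supplied per field.

```python
def run_filter(records, evrs_by_field):
    """Split records into (valid, errors) using EVRs keyed by field name.

    `evrs_by_field` maps a field name to a list of (description, test)
    pairs; a record is valid only if every test on every field passes.
    A missing field is tested as an empty string.
    """
    valid, errors = [], []
    for rec in records:
        failed = [
            f"{name}: {desc}"
            for name, evrs in evrs_by_field.items()
            for desc, test in evrs
            if not test(rec.get(name, ""))
        ]
        if failed:
            errors.append((rec, failed))   # kept for editing and resubmission
        else:
            valid.append(rec)              # passed on to the application
    return valid, errors
```

Keeping the failed rule descriptions alongside each rejected
record gives whoever corrects the error file the information
needed to edit and resubmit the data.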



[Diagram not reproduced. Components shown: source data file,
data extraction program, data dictionary, extracted data
file, edit/validation program(s), and application
program(s). Labeled flows: raw data, metadata, EVR, new
data, valid data.]

Figure 3.5 The Data Filter System

Figure 3.5 shows the complete data filter system
with a data extraction module added. This configuration
increases data validation efficiency by reducing the amount
of data to be "filtered." In the DCSPLANS' case, due to the
enormity of the EMF and some other source data files, the
time saved becomes quite significant.



C. ADVANTAGES

Almost all data editing and validation systems provide
the user a capability to validate and edit data, and to
correct and report erroneous data. There are, however,
added benefits to be gained by using the active data
dictionary approach which forms the basis of the data filter
configuration described above.

First, since the active data dictionary becomes the
sole source of metadata for all edit and validation
processes, redundant metadata is eliminated and metadata
consistency is promoted. In essence, a much greater degree
of control over metadata is realized, and, as a result,
regulated, consistent validation of data is achieved.

Second, the data dictionary affords the user a very
flexible and easily adjustable validation mechanism.
Changes in data and revisions to validation criteria do not
require modification of application programs or edit and
validation programs. Instead, changes are easily
accommodated by simple adjustments to metadata/EVR.

Third, should the information system involved be
file-based (as is the case with DCSPLANS), the data
dictionary approach is an invaluable "bridge" for a future
transition to a database system. Ease of transition is
promoted by already having in existence an organized,
centralized store of the enterprise's metadata.

One other benefit of the proposed data filter system
stems from the separation of the data extraction program
from the actual edit and validation activities. Not only is
overall validation speed increased, but also the user
has the option, in exigent circumstances, to forego
validation entirely if time constraints so demand.
An interdependent extraction/validation process would not
allow this alternative.
IV. PLANNING AND GENERAL DESIGN



A. KEY DEVELOPMENT PHASES 

A software product's ability to do what it is supposed
to do efficiently is largely governed by the quality of the
detailed design and coding that creates it. In turn,
successful detailed design and coding are directly tied to
the quality of initial planning and design activities.
Thus, the planning and preliminary design steps taken by
users to develop a local data filter are crucial, and must
be comprehensively and carefully accomplished.

Planning and initial design of a data filter is a three
phased process. Phase one describes the system's
environment and general characteristics. Phase two develops
data definitions and validation criteria. Phase three
produces an initial logical design of the system. A
description of each of these phases is presented below,
along with a "checklist" of relevant questions which serves
as a guide for proceeding through the phase.

The checklists form a framework within which
users/developers can methodically develop the data filter.
The framework assists them in:

1. Obtaining a clear, comprehensive picture of the
environment in which the data filter will function.

2. Identifying and defining the data to be validated,
and determining the nature and scope of validation
required.

3. Constructing well-defined, functionally structured
validation and EVR modules.



B. PHASE ONE - SYSTEM ENVIRONMENT/GENERAL CHARACTERISTICS



1. Description

This phase identifies all hardware and firmware
being used (or projected for use) in the overall information
system, and describes its environment (e.g., distributed vs.
centralized system, file system vs. database system, etc.).
It notes validation capabilities already built into the
system, and also identifies commercial validation
capabilities which are compatible with existing hardware and
firmware.

Phase one also uncovers the general nature of the 
input data to be validated. It identifies the broad 
categories of input data, examines data stability and 
consistency, and looks at who exercises control over the 
entry of data into the system. This phase outlines data 
entry methods and notes the various processing stages at 
which data validation may occur (pre-input, during input, 
etc.). An overview of system output is also formulated. 

The level of accuracy required for the output is 
established, and the degree to which output validity is 
dependent upon valid input is determined. 

2. Checklist

Answers to the following questions will provide a
clear picture of the overall system, including inputs and
outputs:

a) What major hardware components comprise the system? 

b) What operating system is used? 

c) What validation capabilities are already built into 
the system hardware/firmware? 

d) Are there currently any plans to ch a r.ge/ex pa ni major 
system hardware? 
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e) Are any sy ste ir-compa tible data validation pro 1 acts 
currently available (either in-house or co amerc ra lly) ? 

f) What system-compatible data dictionary software is 
currently available (either in-house or commercially) 7 

j) Are we dealing with a file-based or database system? 

h) what portions of the information system are 
distributed? 

i) Hew stable are system inputs? (i.e., Are different 
data elements, records anc files added or deleted or. a 
frequent basis?) 

j) Are data definitions and parameters changed 
frequently? 

k) Are we dealing with a stable number of data elements 

which will retain stable attrioutes? 

l) Is input processed in a batch moio, on-line, or both? 

m) Is any pre-input validation conducted? Describe! 

n) Is any output validation conducted? Describe! 

o) What are the sources of input data? Identify all
input files and the applications for which they
provide data.

p) What degree of control over the entry and update of
input data is exercised by system users?

q) From what locations, and by whom, can data be added,
changed or deleted?

r) What sources beyond the user's control provide input
data? Identify the data provided by each of these
outside sources.

s) How often is data entered? Updated?

t) How is the processed data being used? (A general
description, e.g., report generation, modeling, etc.)

u) For each application, report, etc., how critical is
validity? (i.e., What are the consequences of
inaccurate outputs?)

C. PHASE TWO - DATA DEFINITION/VALIDATION CRITERIA

1. Description

This phase identifies and defines the system's data
entities. For the purpose of the data filter, data entities 
include all data elements entered into the system and the 
records and files which contain them. The applications 
which use/process these entities are also established. 

Phase two also sets forth all validation checks 
required. Data element characteristics such as description, 
range, type, size, sequence, etc. are recorded, and all 
entity relationships are carefully delineated. The 
information developed during this phase forms the data 
dictionary metadatabase, and is used to construct the 
system's EVR and validation program modules.



2. Checklist

Answers to the questions listed below will enable
the user/developer to identify, describe, and determine the
interrelationships of all system entities. He will also be
able to establish validation criteria for each entity and
cross-reference them to the applications which require that
such validation occur.

a) What data elements does the system contain? 

b) What record(s) contain these data elements?

c) What file(s) contain these records?

d) For each application (model): 

- Which files feed it data? Which records? 

- Which data elements does it use/process? 

- Which data elements must be validated (i.e., does
the validity of the application's output depend on
this input data element being valid)?

- Is a specific sequence of data entry required?

- What pre-entry updates/transactions must occur, if
any?

e) For each data element:

- What is its name? Any synonyms or aliases?

- What is its Short Name/Programming Name?

- What is its ID#?

- What is its character type (alpha, numeric, etc.)?

- What minimum and maximum number of characters are
allowed?

- What numeric value range applies?

- What character pattern is used (e.g., CCC-NNN-CC)?

- Is there a minimum/maximum range of allowable change
from one update to the next?

- What cause and effect relationships exist with other
data elements? In the same record/file, in other
records/files? (e.g., If "A" is changed, then "B"
must be changed).

- Is a particular update sequence required?

- Do date fields have any earliest or latest date
limits?

- Do date fields require a special format (e.g.,
YYMMDD)?

- What direct relationships exist with other data
items? (e.g., value of "A" must always be twice
that of "B").

- Is the data element a code or a value that can be
checked against a table or listing of valid codes or
values?
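
The answers gathered for each data element can be held as one
metadata record and applied mechanically to incoming values. The
following Python sketch is illustrative only: the class name
ElementDefinition, its field names, and the reading of the pattern
notation (C for a letter, N for a digit) are assumptions made for
this example and are not taken from the DCSPLANS design.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class ElementDefinition:
    """Metadata for one data element (hypothetical field names)."""
    name: str
    char_type: str                                # "alpha", "numeric" or "any"
    min_len: int
    max_len: int
    value_range: Optional[Tuple[int, int]] = None  # inclusive (low, high)
    pattern: Optional[str] = None                 # e.g. "CCC-NNN-CC"
    valid_codes: Optional[FrozenSet[str]] = None  # table of legal codes


def pattern_to_regex(pattern: str) -> str:
    """Translate C (letter) / N (digit) notation into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "C":
            parts.append("[A-Za-z]")
        elif ch == "N":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def check_element(defn: ElementDefinition, value: str) -> List[str]:
    """Return the names of the checks that failed (empty list = valid)."""
    errors = []
    is_numeric = re.fullmatch(r"[0-9]+", value) is not None
    if not (defn.min_len <= len(value) <= defn.max_len):
        errors.append("size")
    if defn.char_type == "alpha" and not value.isalpha():
        errors.append("type")
    if defn.char_type == "numeric" and not is_numeric:
        errors.append("type")
    if defn.value_range is not None and is_numeric:
        low, high = defn.value_range
        if not (low <= int(value) <= high):
            errors.append("range")
    if defn.pattern is not None:
        if re.fullmatch(pattern_to_regex(defn.pattern), value) is None:
            errors.append("pattern")
    if defn.valid_codes is not None and value not in defn.valid_codes:
        errors.append("code")
    return errors
```

Relationship checks between elements (cause and effect, "A" twice
"B", update sequence) need more than one value at a time and are
deliberately left out of this single-element sketch.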

D. PHASE THREE - INITIAL LOGICAL DESIGN

1. Description

Phase three produces a model of the logical
structure of the data filter system which later will be
"built" (during coding and testing). Since it forms the
basis for all further design steps and refinements, this
preliminary logical design is the key step in the data 
filter design process. The data filter structure developed 
during this phase is based upon the general filter design 
cited in chapter three and the system environment and 
data/validation information gathered during phases one and 
two. 

Phase three gives the user a description of the data 
filter system goal and objectives, and presents the major 
system functions. These major functions are then decomposed 
into sub-functions until a series of simple, independent
modules has been identified. This overall system
architecture is depicted in a hierarchical structure diagram 
(See Figure 4.1) accompanied by narrative descriptions of 
the modules. 



2. Checklist

Answers to the following questions will enable the
user/developer to produce the information described above:

a) What is the goal of the system? (State the general 
long-term desired effect). 

b) What are the system's key objectives? (Enumerate the
critical milestones to be accomplished to satisfy the
stated system goal).

c) What are the system's major functions? (List the
general processing activities required to meet system
objectives). For example, a bank's checking account
system may have four major system functions: (1)
performing account administration (open accts., close
accts., etc.), (2) processing deposits, (3) processing
withdrawals, (4) maintaining an account transaction
database.

d) What modules (sub-functions) comprise each of the
system's major functions? (Limit to no more than 3-5
modules per function, and repeat the process level by
level until no further module decomposition is
necessary, i.e., simple, independent modules have been
created).

e) What does each system module do? (Give a precise,
concise description of approximately two sentences).

Figure 4.1 Structure Diagram

3. Follow-on Design

Once the above phases have been completed and
carefully documented, the data filter structure has been
tailored to the user's specific environment and validation
needs. Subsequent development involving detailed design
(data flows, data stores, interfaces, etc.), coding,
testing, etc. can follow using one of a number of applicable
methodologies which currently exist.



V. THE DCSPLANS "DATA FILTER" SYSTEM



This chapter specifically addresses the DCSPLANS' "data 
filter" system. It provides a statement of the system's 
overall goal and its key objectives. It also expands the 
general data filter design provided in chapter three into a 
more detailed hierarchical design structure tailored to the 
DCSPLANS situation. 

A. DCSPLANS SYSTEM GOAL AND OBJECTIVES

A number of DCSPLANS' unique operational characteristics
must be considered when formulating the system's goal and
its key objectives. These critical aspects are uncovered
during Phases I and II of the preliminary development
activity (presented in the previous chapter), and are used
to create the Phase III deliverables illustrated in this
chapter (System Goal/Objectives and Structure Diagram with
Narratives). A sample of the DCSPLANS characteristics
having the greatest impact on the general system design is
presented below.

The most important fact is that DCSPLANS personnel have 
little faith in the accuracy of input data they are
receiving from a variety of very large source files prepared
and maintained by elements outside their span of control.

At the present time, DCSPLANS does not possess the
capability to validate this questionable input data. They
are, however, extremely worried about the adverse impact of
such input data on the validity of model outputs.

Input source files provide crucial data to DCSPLANS'
force alignment models. Each of the files feeds a varying
number of models, and supplies a unique set of data elements
depending on the particular model involved. Generally, the
data elements contained in the source files and the data
elements required by the models remain the same, creating
relatively good system stability in this regard. There are,
however, occasional changes made in the data elements
provided or required. A DCSPLANS validation tool must
provide the flexibility to incorporate such changes easily.

In many cases, models using the same data elements from
the same source file require different degrees of validation
(e.g., the validity of input data element "A" from the
Enlisted Master File may be crucial to the validity of
Personnel Readiness Indicator Model output, but
inconsequential to the validity of output produced by the
Personnel Policy Projection Model (P3M)). Thus, a DCSPLANS
validation tool must be able to differentiate between the
validation required for Enlisted Master File data when used
by the Personnel Readiness Indicator Model as opposed to the
P3M, and it must apply edit and validation rules
accordingly.
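
One way to picture this requirement is a lookup keyed on the
combination of source file and model. The Python sketch below is a
minimal illustration under stated assumptions: the codes "EMF",
"PRIM" and "P3M" and the element names are hypothetical labels
chosen for the example ("PRIM" is an invented short name for the
Personnel Readiness Indicator Model), not DCSPLANS metadata.

```python
# Hypothetical table: (source file, model) -> elements that must be validated.
REQUIRED_VALIDATION = {
    ("EMF", "PRIM"): {"ELEMENT_A", "ELEMENT_B"},
    ("EMF", "P3M"): {"ELEMENT_B"},
}


def elements_to_validate(source_file, model):
    """Return the set of elements to validate for this file/model pair."""
    key = (source_file, model)
    if key not in REQUIRED_VALIDATION:
        raise ValueError(f"no validation rules recorded for {key}")
    return REQUIRED_VALIDATION[key]


def needs_validation(source_file, model, element):
    """True when the element's validity matters to this model's output."""
    return element in elements_to_validate(source_file, model)
```

With this table, element "A" is checked when the Enlisted Master
File feeds the first model and is passed through unchecked when the
same file feeds the P3M. Adding or dropping an element is a change
to the table, not to the program.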

Generally, DCSPLANS' models are run on a standard 
schedule which coincides with required briefings/reports and 
which also facilitates use of one model's output as input
for another model. There are, however, occasions when a
model's output is required on very short notice. In these
circumstances, the time normally devoted to data validation
may not be available, and the DCSPLANS' models would have to 
be run in the quickest possible time without regard to data 
integrity. While such a procedure seems unwise, it may 
occur, and the DCSPLANS validation tool must provide for
such a contingency by allowing itself to be circumvented if
required. In this regard, the DCSPLANS data filter cannot
be a mandatory part of any integral data extraction or
modeling process.

The majority of DCSPLANS modeling activities will be
done in a batch mode. The extraction of pertinent data from
large input files is also a batch process (e.g., the
"UTRACS" program developed and used by DCSPLANS to extract
pertinent data from the Enlisted Master File). However,
capabilities to manipulate data dictionary metadata on-line
and to query the metadatabase on-line are crucial to
effective, user-friendly operation of the data filter
system. All other data filter processes (e.g., EVR
formulation) will be done in batch mode to insure run-time
efficiency.

Based upon an examination of the overall DCSPLANS
situation, and keying on the points just mentioned, the goal
of the DCSPLANS data filter system is to validate all
externally provided input data used by DCSPLANS' force
alignment models in consonance with established DCSPLANS
quality control standards.

Key objectives of the DCSPLANS data filter system are:

1. It must be compatible with the existing DCSPLANS
computer system configuration.

2. It must allow flexible and easy additions and updates
to the metadatabase.

3. Its interface with the data extraction and modeling
processes must be optional (at the discretion of the
Chief, DCSPLANS; otherwise it will be an automatic,
mandatory interface).

4. It must provide for the automatic adjustment of edit
and validation rules to suit the particular source
file and model being processed.

5. It must provide an interactive on-line query facility
for accessing the metadatabase.

6. It must provide an error/status reporting
facility.

7. It must be a user-friendly system.

8. System development and implementation costs must be
consistent with the "local" nature of the system. A
conservative approach is desired.
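
Objective 3 implies that the filter sits beside the extraction and
modeling run rather than inside it. The short Python sketch below
shows one way such an optional interface might look; the function
name prepare_model_input and the bypass_filter flag are assumptions
introduced for illustration only.

```python
def prepare_model_input(records, validate, bypass_filter=False):
    """Split records into (accepted, rejected) before a model run.

    validate is any callable returning True for an acceptable record.
    When bypass_filter is True (a short-notice run authorized by the
    responsible official), every record is passed through unchecked.
    """
    if bypass_filter:
        return list(records), []
    accepted, rejected = [], []
    for record in records:
        if validate(record):
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected
```

Because the bypass is an explicit argument with a default of False,
validation remains the automatic behavior and circumvention has to
be requested deliberately.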



B. "DATA FILTER" STRUCTURE

This section uses a structure diagram (in modified
format) to set forth the proposed structure of the DCSPLANS
"data filter" system software. The structure is derived
from a functional decomposition process in which major
system functions are split successively into sets of
sub-functions. The proposed DCSPLANS system will be
decomposed to three levels. This decomposition demonstrates
the hierarchical control structure and relationships of
modules which comprise the overall "data filter" program.
It does not represent any particular processing sequence or
order of decision-making. [Ref. 13: p. 149]

The structure diagram is normally presented in the
graphical format shown in Figure 4.1. However, due to the
crowding effect that will occur from a three-level
decomposition, the major system functions (level 1) and
subordinate modules (levels 2 and 3) are represented here in
paragraph/sub-paragraph format (See Figure 5.1). Modules
depicted in this manner are easily transferred to a graphic
representation of the overall system, if required.

1. Structure Diagram

The proposed data filter system contains five major
functions (Control Data Filter System, Maintain
Metadatabase, Produce EVR, Validate Input Data, Generate
Reports). The system's hierarchical structure is
illustrated below, followed by descriptions of each major
function, sub-function, and lower level module.

DCSPLANS Data Filter

1.0 FIRST MAJOR FUNCTION (Level 1)

    1.1 First Sub-function of 1.0 (Level 2)

        1.1.1 First Module of 1.1 (Level 3)

        1.1.2 Second Module of 1.1 (Level 3)

        1.1.3 Third Module of 1.1 (Level 3)

    1.2 Second Sub-function of 1.0 (Level 2)

        1.2.1 First Module of 1.2 (Level 3)

2.0 SECOND MAJOR FUNCTION (Level 1)

    2.1 First Sub-function of 2.0 (Level 2)

        2.1.1 First Module of 2.1 (Level 3)

        2.1.2 Second Module of 2.1 (Level 3)

    2.2 Second Sub-function of 2.0 (Level 2)

    2.3 Third Sub-function of 2.0 (Level 2)

        2.3.1 First Module of 2.3 (Level 3)

(ETC.)

Figure 5.1 Sample Paragraph Format

1.0 CONTROL DATA FILTER SYSTEM

    1.1 Verify Transaction Validity

        1.1.1 Read Access and Transaction Codes

        1.1.2 Evaluate Codes

        1.1.3 Implement Validity Decision

    1.2 Provide Menu/Screen

        1.2.1 Read Validity Decision

        1.2.2 Display Appropriate Screen

    1.3 Transfer Control

        1.3.1 Read Screen Input

        1.3.2 Determine Proper Process

        1.3.3 Pass Program Control

2.0 MAINTAIN METADATABASE

    2.1 Control

        2.1.1 Provide Metadatabase Menu

        2.1.2 Transfer Control

    2.2 Add Metadata

        2.2.1 Read Add Data

        2.2.2 Check Uniqueness

        2.2.3 Check Format

        2.2.4 Accept Data

    2.3 Delete Metadata

        2.3.1 Read Delete Request

        2.3.2 Locate Metadata

        2.3.3 Remove Metadata

    2.4 Change Metadata

        2.4.1 Read Change Request

        2.4.2 Locate Metadata

        2.4.3 Update Metadata

3.0 PRODUCE EVR

    3.1 Control

    3.2 Retrieve Metadata

        3.2.1 Read Source File/Model Codes

        3.2.2 Open Metadata File(s)

        3.2.3 Extract Pertinent Data Values

    3.3 Formulate EVR

        3.3.1 Load Variables

        3.3.2 Set Switches

4.0 VALIDATE INPUT DATA

    4.1 Control

    4.2 Select EVR

        4.2.1 Determine Input Record Types

        4.2.2 Extract Applicable EVR

    4.3 Apply EVR

        4.3.1 Read Input Data

        4.3.2 Read EVR

        4.3.3 Check Parameters

    4.4 Provide Processed Input Data

        4.4.1 Read Error Code

        4.4.2 Transfer Erroneous Data/Error Code

        4.4.3 Transfer Valid Data

    4.5 Maintain Statistics

        4.5.1 Maintain Transaction Count

        4.5.2 Maintain Error Count

        4.5.3 Sort Error Types

5.0 GENERATE REPORTS

    5.1 Control

    5.2 Retrieve Report/Response Data

        5.2.1 Determine Report/Response Type

        5.2.2 Read Applicable Data

    5.3 Perform Calculations

    5.4 Provide Report/Response

        5.4.1 Determine Format

        5.4.2 Format Data

        5.4.3 Transfer to Output Device



2. Narrative Descriptions

The following are succinct explanations of the key
aspects of each structure diagram function, sub-function,
and module. Each lower level description serves to
refine/expand the detail of its superior level.

- 1.0 CONTROL DATA FILTER SYSTEM: This function controls
access to the data filter system and verifies
transaction validity. It also provides screens for
implementing other major system functions, and
transfers control to these processes.

- 1.1 VERIFY TRANSACTION VALIDITY: This sub-function
insures that the user is authorized access to the
system for the desired transaction, and that the
transaction itself is valid (e.g., an attempt to
validate the Enlisted Management File for use in the
Officer Promotion Model would be rejected).

- 1.1.1 READ ACCESS AND TRANSACTION CODES: This module
reads in the user's access code and the transaction
codes indicating the desired process and the source
input file/model(s) involved.

- 1.1.2 EVALUATE CODES: This module checks user-supplied
codes against authorized access and transaction codes.

- 1.1.3 IMPLEMENT VALIDITY DECISION: This module will
either reject the transaction or pass an indication of
a valid transaction to module 1.2.1. This module also
sets restrictions within authorized processes (e.g., a
user may be allowed to add metadata, but not change or
delete existing metadata).

- 1.2 PROVIDE MENU/SCREEN: This sub-function provides
the user with the appropriate screen for continued use
of the system.

- 1.2.1 READ VALIDITY DECISION: This module reads the
validity indicator produced by module 1.1.3.

- 1.2.2 DISPLAY APPROPRIATE SCREEN: This module causes
either a menu or screen, as appropriate, to appear on
the monitor.

- 1.3 TRANSFER CONTROL: This sub-function passes control
to an appropriate system module in response to user
input.

- 1.3.1 READ SCREEN INPUT: This module reads user
responses to terminal prompts.

- 1.3.2 DETERMINE PROPER PROCESS: This module interprets
user input in terms of the desired system function
(e.g., update metadata, generate report, etc.).

- 1.3.3 PASS PROGRAM CONTROL: This module passes control
to the appropriate system module.
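
The checks performed by modules 1.1.1 through 1.1.3 can be sketched
in a few lines of Python. Everything named here is hypothetical:
the access codes, the transaction names, and the table of source
file/model pairs are placeholders invented for the illustration.

```python
# Hypothetical authorization table: access code -> permitted transactions.
AUTHORIZED = {
    "ANALYST1": {"ADD_METADATA", "VALIDATE"},
    "ADMIN1": {"ADD_METADATA", "CHANGE_METADATA",
               "DELETE_METADATA", "VALIDATE"},
}

# Hypothetical table of source files that legitimately feed each model.
VALID_PAIRS = {("EMF", "PRIM"), ("EMF", "P3M")}


def verify_transaction(access_code, transaction, source_file=None, model=None):
    """Return (is_valid, reason) for a requested transaction."""
    allowed = AUTHORIZED.get(access_code)
    if allowed is None:
        return False, "unknown access code"
    if transaction not in allowed:
        return False, "transaction not authorized for this user"
    if transaction == "VALIDATE" and (source_file, model) not in VALID_PAIRS:
        return False, "source file is not used by this model"
    return True, "ok"
```

The per-user transaction sets express the restriction noted for
module 1.1.3: a user may hold the right to add metadata without
holding the right to change or delete it.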

- 2.0 MAINTAIN METADATABASE: This function creates new
metadatabase entries, deletes metadatabase contents,
and makes changes to the existing metadatabase.

- 2.1 CONTROL: This sub-function displays the
metadatabase menu, and governs the activation and
sequence of add, change and delete processes.

- 2.1.1 PROVIDE METADATABASE MENU: This module displays
a menu giving the user options of adding, deleting or
changing metadata.

- 2.1.2 TRANSFER CONTROL: This module passes control to
either modules 2.2, 2.3, or 2.4, depending on user's
request and access authorization.

- 2.2 ADD METADATA: This sub-function reads metadata
input, checks it for duplication and proper entry
format, and either rejects the input or stores it in
the metadatabase.

- 2.2.1 READ ADD DATA: This module reads data which the
user desires to enter into the metadatabase.

- 2.2.2 CHECK UNIQUENESS: This module checks the
metadatabase to insure data to be added does not
already reside there.

- 2.2.3 CHECK FORMAT: This module checks data to be
added for compliance with prescribed standard metadata
entry formats.

- 2.2.4 ACCEPT DATA: This module evaluates results of
module 2.2.2 and 2.2.3 processing, and either rejects
data to be added or stores it in the metadatabase.

- 2.3 DELETE METADATA: This sub-function reads a metadata
deletion request, locates the data in the metadatabase,
and removes it.

- 2.3.1 READ DELETE REQUEST: This module reads the
user's request to delete data.

- 2.3.2 LOCATE METADATA: This module locates indicated
metadata in the metadatabase.

- 2.3.3 REMOVE METADATA: This module removes metadata
from the metadatabase after a re-verification of the
user's desire to delete the data.

- 2.4 CHANGE METADATA: This sub-function reads a
metadata change request, locates the data to be
changed, and updates the data after verification that
the new metadata meets the prescribed entry format.

- 2.4.1 READ CHANGE REQUEST: This module reads the
user's request to update existing metadata.

- 2.4.2 LOCATE METADATA: This module locates the
metadata to be changed.

- 2.4.3 UPDATE METADATA: This module replaces old
metadata with new metadata.
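
The add, delete and change sub-functions described for function 2.0
amount to a small keyed store with two guards, uniqueness and entry
format. The Python sketch below is a minimal in-memory illustration;
the class name, the required fields, and the confirmed flag standing
in for re-verification are all assumptions made for this example.

```python
class Metadatabase:
    """In-memory stand-in for the data dictionary metadatabase."""

    REQUIRED_FIELDS = ("name", "char_type", "max_len")  # assumed entry format

    def __init__(self):
        self._entries = {}

    def _format_ok(self, entry):
        """Check Format: all required fields present and max_len sensible."""
        return (isinstance(entry, dict)
                and all(field in entry for field in self.REQUIRED_FIELDS)
                and isinstance(entry["max_len"], int)
                and entry["max_len"] > 0)

    def add(self, entry):
        """Add Metadata: reject bad format or duplicates, else store."""
        if not self._format_ok(entry):
            return False
        if entry["name"] in self._entries:   # Check Uniqueness
            return False
        self._entries[entry["name"]] = dict(entry)
        return True

    def delete(self, name, confirmed=False):
        """Delete Metadata: remove only after the user re-verifies."""
        if name not in self._entries or not confirmed:
            return False
        del self._entries[name]
        return True

    def change(self, name, new_entry):
        """Change Metadata: replace an existing entry if the format is valid."""
        if name not in self._entries or not self._format_ok(new_entry):
            return False
        if new_entry["name"] != name:
            return False
        self._entries[name] = dict(new_entry)
        return True

    def get(self, name):
        """Locate Metadata: return a copy of the entry, or None."""
        entry = self._entries.get(name)
        return dict(entry) if entry is not None else None
```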

- 3.0 PRODUCE EVR: This function produces edit and
validation rules for use by sub-function 4.3. Metadata
values are extracted from the metadatabase and are
transformed into bounded conditional statements through
which input data will be run.

- 3.1 CONTROL: This sub-function governs the activation
and sequence of processes involved with the production
of edit and validation rules.

- 3.2 ACCEPT PROCESSING CODES: This sub-function reads
the source file and model codes entered by the user,
opens appropriate metadata files, extracts applicable
metadata values, and stores them in a "variables" file.

- 3.2.1 READ SOURCE FILE/MODEL CODES: This module reads
the source file and model identification codes entered
earlier by the user.

- 3.2.2 OPEN METADATA FILE(S): This module identifies
and opens all metadata files containing data relating
to source file and models noted by module 3.2.1.

- 3.2.3 EXTRACT PERTINENT DATA VALUES: This module
extracts pertinent metadata values from opened
metadatabase files and stores the data in a "variables"
file.

- 3.3 FORMULATE EVR: This sub-function reads the
metadata values stored in the "variables" file into a
file of pre-established conditional statements, thereby
setting switches either on or off and setting upper and
lower boundaries for acceptable input data values.
(Settings and boundaries will therefore vary according
to the combination of source file and model codes
presented by the user.)

- 3.3.1 LOAD VARIABLES: This module reads the
"variables" file into a file of pre-set conditional
statements.

- 3.3.2 SET SWITCHES: This module, depending on variable
values, sets switches either on or off and establishes
upper and lower boundaries, as required.
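
The idea of loading variables into pre-set conditional statements
can be illustrated with a rule object that carries a switch and a
pair of bounds. This Python sketch is an assumption-laden example:
the VARIABLES table, the Rule class, the "PRIM" model code and the
YEARS_OF_SERVICE element are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """One bounded conditional statement with its on/off switch."""
    element: str
    enabled: bool
    low: int
    high: int


# Hypothetical contents of the "variables" file, keyed by
# (source file code, model code, data element name).
VARIABLES = {
    ("EMF", "PRIM", "YEARS_OF_SERVICE"): {"validate": True, "low": 0, "high": 30},
    ("EMF", "P3M", "YEARS_OF_SERVICE"): {"validate": False, "low": 0, "high": 30},
}


def formulate_evr(source_file, model):
    """Build the rules that apply to one source file/model combination."""
    rules = []
    for (src, mdl, element), values in VARIABLES.items():
        if (src, mdl) == (source_file, model):
            rules.append(Rule(element, values["validate"],
                              values["low"], values["high"]))
    return rules


def passes(rule, value):
    """A switched-off rule accepts anything; otherwise apply the bounds."""
    return (not rule.enabled) or (rule.low <= value <= rule.high)
```

The conditional statement itself (passes) never changes. Only the
switch and bounds loaded into it differ from one source file and
model combination to the next.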

- 4.0 VALIDATE INPUT DATA: This function actually
performs the validation by selecting specific EVR,
applying these EVR to the input data, and providing the
processed input data to either a "validated data" file
or an "error" file. This function also maintains
statistics on the number of data items processed and
the number and category of errors found.

- 4.1 CONTROL: This sub-function governs the activation
and sequence of processes involved in the actual
validation of input data.

- 4.2 SELECT EVR: This sub-function identifies the type
of record(s) being validated from the source file, and
activates only those EVR which apply. (This
sub-function precludes the validation program from
unnecessarily running an input record past all source
file EVR, thereby enhancing run-time efficiency of the
overall process.)

- 4.2.1 DETERMINE INPUT RECORD TYPES: This module
identifies the subset of records that are being
validated from the source input file.

- 4.2.2 EXTRACT APPLICABLE EVR: This module extracts
only those EVR applicable to the record types being
validated.

- 4.3 APPLY EVR: This sub-function reads the input data
and its associated EVR, and compares them to verify
compliance.

- 4.3.1 READ INPUT DATA: This module sequentially reads
input data to be validated.

- 4.3.2 READ EVR: This module reads EVR from module
4.2.2.

- 4.3.3 CHECK PARAMETERS: This module compares input
data to EVR parameters, assigning an appropriate error
code (including "no error").

- 4.4 PROVIDE PROCESSED INPUT DATA: This sub-function
reads the processed data and its error code, and
transfers the data accordingly.

- 4.4.1 READ ERROR CODE: This module reads the data and
associated error code from module 4.3.3.

- 4.4.2 TRANSFER ERRONEOUS DATA/ERROR CODE: This module
transfers erroneous data with its associated error code
to an "error" file.

- 4.4.3 TRANSFER VALID DATA: This module transfers all
valid input data to a "validated data" file.

- 4.5 MAINTAIN STATISTICS: This sub-function maintains a
running count of the number of transactions processed
and the number and type of errors found.

- 4.5.1 MAINTAIN TRANSACTION COUNT: This module
maintains a running count of the number of transactions
processed in a validation activity.

- 4.5.2 MAINTAIN ERROR COUNT: This module counts the
number of errors found and notes the error code
involved.

- 4.5.3 SORT ERROR TYPES: This module sorts a validation
activity's error count by type of error.
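
Sub-functions 4.3 through 4.5 together form a single pass over the
input: check each record, route it, and count the outcome. The
Python sketch below shows that flow with lists standing in for the
"validated data" and "error" files. The error code values and the
(low, high) rule format are assumptions for this illustration.

```python
from collections import Counter

# Hypothetical error codes; 0 plays the role of "no error".
NO_ERROR = 0
RANGE_ERROR = 1
MISSING_ERROR = 2


def check_parameters(record, rules):
    """Compare one record with its rules and return an error code.

    rules maps a field name to an inclusive (low, high) pair.
    """
    for field, (low, high) in rules.items():
        if field not in record:
            return MISSING_ERROR
        if not (low <= record[field] <= high):
            return RANGE_ERROR
    return NO_ERROR


def run_filter(records, rules):
    """Route records to validated/error lists and keep running statistics."""
    validated, errors = [], []
    stats = {"transactions": 0, "errors": 0, "by_type": Counter()}
    for record in records:
        code = check_parameters(record, rules)
        stats["transactions"] += 1
        if code == NO_ERROR:
            validated.append(record)
        else:
            errors.append((record, code))
            stats["errors"] += 1
            stats["by_type"][code] += 1
    return validated, errors, stats
```

Calling stats["by_type"].most_common() afterwards yields the error
counts ordered by frequency, which corresponds loosely to the
sorting of error types described for module 4.5.3.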

- 5.0 GENERATE REPORTS: This function accepts requests
for both printed reports and interactive (terminal)
responses, determines and retrieves the appropriate
report/response data, performs calculations and
formatting as required, and issues the requested
report/response.

- 5.1 CONTROL: This sub-function governs the activation
and sequence of processes involved with the production
of printed reports and interactive response to terminal
queries.

- 5.2 RETRIEVE REPORT/RESPONSE DATA: This sub-function
determines the type of report/response desired and
reads required data from appropriate files.

- 5.2.1 DETERMINE REPORT/RESPONSE TYPE: This module
interprets the user request for information in terms of
report/response content.

- 5.2.2 READ APPLICABLE DATA: This module locates, reads
and temporarily stores the data needed for the
requested report/response.

- 5.3 PERFORM CALCULATIONS: This sub-function determines
whether calculations are required to produce desired
information, and if so, it reads the appropriate data
and performs the required operations, producing "new"
report/response data.

- 5.4 PROVIDE REPORT: This sub-function determines the
appropriate report/response format, formats the data
accordingly, and transfers the formatted data to the
appropriate output device.

- 5.4.1 DETERMINE REPORT FORMAT: This module determines
the format required for the desired response in
accordance with pre-established format parameters.

- 5.4.2 FORMAT DATA: This module arranges data in proper
format.

- 5.4.3 TRANSFER TO OUTPUT DEVICE: This module transfers
the formatted data to the appropriate output device.
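
As a small illustration of the calculation and formatting steps in
function 5.0, the Python sketch below derives an error rate from
validation statistics and arranges the result as report text. The
function name and the report layout are assumptions; the thesis does
not prescribe a report format.

```python
def build_error_report(transactions, error_counts):
    """Format a plain-text error/status summary.

    error_counts maps an error code to the number of occurrences.
    """
    total_errors = sum(error_counts.values())
    # The error rate is "new" data calculated from the stored counts.
    rate = (100.0 * total_errors / transactions) if transactions else 0.0
    lines = [
        f"Transactions processed: {transactions}",
        f"Errors found: {total_errors} ({rate:.1f}%)",
    ]
    for code in sorted(error_counts):
        lines.append(f"  error code {code}: {error_counts[code]}")
    return "\n".join(lines)
```

The returned string can be printed to a terminal or written to a
printer file, which keeps the choice of output device separate from
the formatting step.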



C. "DATA FILTER" IMPLEMENTATION 

Two key advantages inherent in the proposed local data
validation system concept are low development costs and
speedy implementation. In this light, initial DCSPLANS
development efforts must focus on the creation of a
prototype system that takes maximum advantage of existing
resources. Specifically, the DCSPLANS prototype must
incorporate the existing UTRACS program which extracts
relevant Enlisted Master File (EMF) data, the existing DBASE
II data dictionary which currently includes general model
and office metadata in its metadatabase, and the existing
DCSPLANS IBM PC microcomputer. The DCSPLANS local data
filter system therefore will consist of an IBM PC based,
DBASE II program which filters EMF input data for use in two
application models (two models must be used to test the
system's ability to differentiate between the degrees of
validation required by separate models using the same input
data source file).

The following steps suggest a methodology for
development of the initial DCSPLANS prototype "data filter"
system.

1. Determine and implement the proper interface
mechanism for feeding UTRACS extracted EMF data
through the IBM PC data filter system.

2. Expand current data dictionary capabilities by
creating additional metadatabase modules which will
accept and store metadata about source file and model
data elements. Create an additional data dictionary
metadatabase module that will accept and store EVR.

3. Using the Phase II checklist from chapter four,
comprehensively construct data definitions for EMF
and model data elements, and create the EVR metadata
which sets data element validation parameters and
interrelationships. This step must be accomplished
with the full, constant cooperation of those DCSPLANS
personnel most closely acquainted with the EMF and
the two application models being used for the
prototype.

4. Load the data definition and EVR metadata into the
data dictionary metadatabase.

5. Using the functional modules from section B of this
chapter as a guide (particularly function 4.0),
create an edit/validation program which will control
and implement the overall data filter process.

The development methodology presented above is based
upon a limited on-site review of DCSPLANS operations. A
more comprehensive examination of the DCSPLANS environment
(See Phase I of the planning and initial design process
described in chapter four) will most likely uncover some
additional requirements and necessary adjustments.
Therefore a detailed on-site environmental review is an
essential prerequisite to any DCSPLANS data filter
development/implementation effort, especially one being
undertaken by non-DCSPLANS personnel.



VI. CONCLUSIONS AND RECOMMENDATIONS



A. CONCLUSIONS 

DCSPLANS, MILPERCEN suffers from a data control problem
common to many small user groups in large data processing
systems. It is unable to verify the correctness of input
data obtained from sources outside its span of control. At
the present time, DCSPLANS must rely almost exclusively on
the competence of its outside sources to guarantee the
integrity of its input data. The situation is causing
DCSPLANS' managers a great deal of concern.

Top-level Army decision-makers use output from DCSPLANS' 
applications to formulate long-range personnel management 
policies. Thus, the adverse impact of erroneous input data
entering DCSPLANS' models can be far-reaching and extremely
serious. Despite this fact, DCSPLANS' small size relative
to the overall MILPERCEN information processing system
precludes it from strongly influencing the adoption of a
system-wide validation capability. DCSPLANS must therefore
develop and implement a "local" solution to its data
validation problem.

DCSPLANS' models and their associated input source files
contain many of the same data items. Additionally, a
variety of relationships exists among the input data. This
situation demands that DCSPLANS use a variety of validation
techniques to insure the accuracy of data used by its
models. In addition to routine format checks, a series of
reasonableness checks is also needed to guarantee that
input is both complete and consistent. Reasonableness
checks are more complex than the format checks, and are, in
fact, the real key to insuring a truly integrated validation
process (i.e., data elements, records and files are not only
valid by themselves, but also in relation to other relevant
elements, records and files). Of course, validation of the
legality and proper sequencing of an input activity itself
must precede the validity checks on the data.

An ideal validation tool for DCSPLANS is the active data
dictionary. Configured as a data filter, the dictionary
provides a flexible, user-friendly, easily expandable
validation system for a "small" user group. The data filter
can be developed locally using the expertise currently
available within DCSPLANS. Such local development allows
the data filter system to be tailored precisely to DCSPLANS'
own validation needs. The data dictionary approach permits
quick, easy adaptation of the data filter to changes in
models and input data source files by simply adjusting
dictionary metadata. No extensive validation program
re-writes will be required. Also, the use of a metadatabase
as a single source of data for building EVR provides a
ready-made mechanism for keeping the EVR consistent.
Lastly, an active data dictionary allows DCSPLANS to develop
future data processing tools/capabilities with relative ease
and minimal investments of time and money.

Preliminary planning is crucial to DCSPLANS' successful
development of the data filter. The overall DCSPLANS data
processing environment must be understood, and data
definition and associated validation requirements must be
comprehensively examined and carefully documented. Thorough
accomplishment of these first two phases of development
will provide a solid base for both preliminary and detailed
system design. Preliminary design should be accomplished
through a functional decomposition of major system
functions. These major functions must be derived from
analysis of phase one and two results, and must support the
achievement of the specific goals and key objectives of the
DCSPLANS system.



E. RECOMMENDATIONS 

An effective DCSPLANS approach to its data validation 
problem must key on the concepts/designs presented in this 
thesis. It is recommended that: 

1. DCSPLANS pursue an efficient "local" solution which 
can be tailored to its specific needs, rather than 
await or attempt to influence the adoption of an 
organization-wide validation system. 

2. the local solution applied by DCSPLANS be an active 
data dictionary "data filter." 

3. DCSPLANS begin development with a prototype system 
that will validate Enlisted Master File (EMF) data 
for use in two models. This approach tests the 
system's ability to differentiate between the degrees 
of validation required by different models using the 
same source data file, and also takes advantage of 
the existing CT? ACS program (extracts relevant EMF 
data). The prototype should use an easy-to-program, 
easy-to-use relational database management system 
with a simple query language facility (similar to 
dBASE II). 

4. DCSPLANS appoint a small project team to oversee the 
data filter development. The team must conduct a 
thorough on-site review of DCSPLANS environmental 
characteristics and data definition/validation 
criteria (Chapter Four) prior to revisions of the 
general design (Chapter Five) and subsequent coding. 
While detailed design and coding can be conducted 
off-site (perhaps as a thesis project), the review of 
environmental characteristics, data definition, and 
validation criteria must be accomplished at DCSPLANS 
by personnel familiar with DCSPLANS operations. The 
checklists in Chapter Four provide comprehensive 
guidelines for such an examination. 









INITIAL DISTRIBUTION LIST 

No. 

1. Defense Technical Information Center 
Cameron Station 
Alexandria, Virginia 22314 

2. Library, Code 0142 
Naval Postgraduate School 
Monterey, California 93943 

3. Department Chairman, Code 54 
Department of Administrative Sciences 
Naval Postgraduate School 
Monterey, California 93943 

4. Dr. Dan Dolk, Code 54DH 
Department of Administrative Sciences 
Naval Postgraduate School 
Monterey, California 93943 

5. Major (P) Robert M. DiBona 
IS Revere Road 
Monterey, California 93940 

6. Computer Technology Curriculum Office 
Code 37 
Naval Postgraduate School 
Monterey, California 93943 
















Thesis 
D51 Dibona 
c.1 Use and design of an active data dictionary 
for local validation of input data. 






